EDA (Exploratory Data Analysis)

Leaderboard EDA

Leaderboard Frequency Plot

  • Shows how many leaderboard entries fall into each score range, across the entire leaderboard.
  • Interactive: the user can choose the range of values to display.
plot_lb_range_interactive(lb_df, "Score", 0, 4000000, 1000000)
Leaderboard Interactive Histogram
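
The definition of plot_lb_range_interactive is not shown on this page. As a rough sketch of the binning step such a plot relies on, the base R snippet below counts scores per fixed-width range. The helper name score_range_counts, the reading of the last three arguments as minimum, maximum, and step, and the example scores are all assumptions made for illustration.

```r
# Hypothetical helper (not the report's function): count scores per fixed-width range.
score_range_counts <- function(df, col, lo, hi, step) {
  breaks <- seq(lo, hi, by = step)
  # right = FALSE gives bins of the form [start, end);
  # include.lowest = TRUE closes the final bin so that `hi` itself is counted.
  bins <- cut(df[[col]], breaks = breaks, right = FALSE,
              include.lowest = TRUE, dig.lab = 10)
  as.data.frame(table(range = bins), responseName = "count")
}

# Made-up scores, for illustration only
lb_example <- data.frame(
  Score = c(250000, 900000, 1500000, 1500001, 3999999, 4000000)
)
counts <- score_range_counts(lb_example, "Score", 0, 4000000, 1000000)
print(counts)
```

Scores outside the chosen minimum and maximum become NA in cut() and are dropped by table(), which matches the idea of letting the user narrow the displayed range.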

Profile EDA

Frequency Plots by Profile

  • The user can select the profile and the temporal metric to plot.
  • Note: This Shiny app won’t display in the self-contained HTML file. To interact with the app, you can run the RMD document in an R Markdown viewer or in the RStudio IDE.

Metrics EDA

Churned Histogram

  • Most users in this sample (approx. 75%) have not churned under this definition (365 days since last achievement).
# Plot a bar chart of churned status with distinct colors for FALSE, TRUE, and NA
ggplot(metrics_df, aes(x = churned, fill = factor(churned))) +
  geom_bar(color = "white") +
  # Named values map to the factor levels; NA is colored through na.value
  scale_fill_manual(values = c("FALSE" = "darkgreen", "TRUE" = "darkred"), na.value = "gray") +
  labs(title = "Churned Histogram (365 Days Since Last Achievement)", x = "Churned Status", y = "Count")
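
The page does not show how the churned column is derived. One possible way to compute a flag consistent with the plot title is sketched below. The column name last_achievement, the reference date, and the strict "more than 365 days" comparison are assumptions, not the report's confirmed logic.

```r
# Hypothetical input: one row per user with the date of their last achievement.
users <- data.frame(
  user = c("a", "b", "c", "d"),
  last_achievement = as.Date(c("2023-05-01", "2021-01-15", NA, "2022-06-01"))
)

# Assumed reference date (for example, the day the profiles were scraped)
reference_date <- as.Date("2023-06-01")

# Days elapsed since the last achievement; NA stays NA for users with no date
days_since <- as.numeric(reference_date - users$last_achievement)

# Assumed rule: churned means strictly more than 365 days without an achievement
users$churned <- days_since > 365

# Share of users with a known status who have not churned
share_not_churned <- mean(!users$churned, na.rm = TRUE)
print(users)
```

Users with no recorded achievement date keep an NA status, which is why the bar chart needs a third category alongside TRUE and FALSE.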

Longest Streak Histogram

  • The most common longest streak is 4 or 5 days.
  • The distribution in this sample is roughly normal.
ggplot(metrics_df, aes(x = longest_streak, fill = factor(longest_streak))) +
  geom_bar(color = "white") +
  labs(title = "Streak Histogram", x = "Longest Streak (in Days)", y = "Count")
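
The longest_streak metric is computed elsewhere in the project. As an illustration only, the sketch below derives a longest streak from a vector of achievement dates, assuming a streak means consecutive calendar days with at least one achievement. The function and the sample dates are hypothetical.

```r
# Hypothetical helper: longest run of consecutive calendar days in a set of dates.
longest_streak <- function(dates) {
  days <- sort(unique(as.Date(dates)))
  if (length(days) == 0) return(0L)
  # A new run starts wherever the gap to the previous day is not exactly one day
  gaps <- as.numeric(diff(days))
  run_id <- cumsum(c(TRUE, gaps != 1))
  # tabulate() counts the days in each run; the largest count is the longest streak
  max(tabulate(run_id))
}

# Made-up dates: a 3-day run, a 2-day run, and one duplicate day
example_dates <- c("2023-01-01", "2023-01-02", "2023-01-03",
                   "2023-01-05", "2023-01-06", "2023-01-02")
print(longest_streak(example_dates))
```

Removing duplicate days first keeps several achievements on the same day from breaking or inflating a run.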

Game Time Box Plot

  • Most players have logged game time in the thousands of hours, with several outliers above 10,000 hours.
  • This plot only shows Xbox One and Series X|S titles.
# Create the box plot for game time
ggplot(metrics_df, aes(x = "", y = total_game_time_minutes / 60, fill = "Game Time")) +
  geom_boxplot(width = 0.5, position = position_dodge(width = 0.9), color = "black", outlier.color = "darkred", outlier.shape = 16, outlier.size = 3) +
  labs(x = "", y = "Game Time (Hours)", fill = "") +
  scale_fill_manual(values = "#FF7F00") +
  theme(legend.position = "top", legend.title = element_blank()) +
  scale_y_continuous(labels = scales::comma) +
  coord_flip()
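
For readers who want to list the outlying players rather than only see them as points, the sketch below applies the usual box plot rule: values more than 1.5 times the interquartile range beyond the quartiles. The helper name flag_outliers and the hours are made up for illustration, and the result may differ slightly from a given plot if the quartiles are computed differently.

```r
# Hypothetical helper: flag values outside the 1.5 * IQR fences around the quartiles.
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Made-up game time in hours
hours <- c(800, 1200, 1500, 2100, 2600, 3000, 12000)
outlier_hours <- hours[flag_outliers(hours)]
print(outlier_hours)
```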

App Time Box Plot

  • We filter out the 138 users with zero app time, i.e. users who don’t use apps on Xbox.
  • Of the 62 players who do use apps on Xbox, most have at or below 2,000 hours of app time. The players above that level appear to use Xbox as a regular device for the tracked apps.
  • This plot only shows Xbox One and Series X|S titles.
# Create the box plot
ggplot(metrics_df[metrics_df$total_app_time_minutes > 0,], aes(x = "", y = total_app_time_minutes / 60, fill = "App Time")) +
  geom_boxplot(width = 0.5, position = position_dodge(width = 0.9), color = "black", outlier.color = "darkblue", outlier.shape = 16, outlier.size = 3) +
  labs(x = "", y = "App Time (Hours)", fill = "", caption = paste("Number of Zero Values Filtered Out:", sum(metrics_df$total_app_time_minutes == 0))) +
  scale_fill_manual(values = "#1F78B4") +
  theme(legend.position = "top", legend.title = element_blank()) +
  scale_y_continuous(labels = scales::comma) +
  coord_flip()
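
The zero filter above can be checked separately from the plot. The sketch below uses made-up minutes, not the report's data, to count the zero values and build the non-zero subset. It uses which() because a logical index containing NA would otherwise introduce NA entries into the subset. The report's data may contain no missing values, so this is a precaution rather than a known fix.

```r
# Made-up app times in minutes; NA marks a profile with no recorded value
app_minutes <- c(0, 0, 120, NA, 45000, 0, 3600)

# Number of zero values that the filter removes
n_zero <- sum(app_minutes == 0, na.rm = TRUE)

# which() drops NA positions, so only genuinely positive values are kept
nonzero_hours <- app_minutes[which(app_minutes > 0)] / 60

print(n_zero)
print(nonzero_hours)
```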

Game vs App Time Scatter Plot

  • Most players have no logged app time, regardless of how much game time they have. In this sample, this suggests that most players use apps on devices other than Xbox.
ggplot(metrics_df, aes(x = total_game_time_minutes / 60, y = total_app_time_minutes / 60, color = total_app_time_minutes / 60)) +
  geom_point() +
  labs(x = "Total Game Time (Hours)", y = "Total App Time (Hours)", color = "Total App Time (Hours)") +
  scale_color_gradient(low = "blue", high = "red") +
  ggtitle("Total Time: Game vs App (Hours)") +
  scale_x_continuous(labels = scales::comma) +
  scale_y_continuous(labels = scales::comma)